POV-Ray : Newsgroups : povray.off-topic : "What every programmer should know about RAM"
  "What every programmer should know about RAM"  
From: Invisible
Date: 21 Apr 2008 11:51:26
Message: <480cb7fe$1@news.povray.org>
http://lwn.net/Articles/250967/

An interesting read, I thought.

[See what I did there?]

Certainly much of this document is pretty much news to me. Last time I 
looked into this stuff, the bus from the CPU to RAM ran at the same 
speed as the rest of the system, and if you had a 16-bit address space, 
you had 16 physical wires.

Also, last time I looked into this, CPUs ran at about 1 MHz. ;-)

I was aware that these days the RAM is about 20x slower than the CPU, 
and that's why the CPU has to have a cache the size of a small planet to 
achieve any performance whatsoever. But it's certainly news to me that 
these days the communication involves a complex *protocol* instead of 
just a direct passive electronic connection. Or that the refresh signal 
has to come from outside the memory subsystem. Now I see why a "memory 
controller" needs to exist.

And now they tell me that the latest designs use a serial link... this 
seems entirely counter-intuitive. Sure, a serial link can be driven at a 
higher clock rate more easily, but it would have to be *orders of 
magnitude* higher to overcome the inherent slowness of a serial link in 
the first place! Hmm, well I guess these guys must know what they're 
doing, even if it doesn't make any sense. ;-)

I must confess to being confused about "fully associative" and "set 
associative".

The info on TLBs was quite interesting, and so was all the NUMA stuff. 
But you begin to wonder what we programmers can actually *do* about any 
of this.

Most especially, when the document talks about laying out data in memory 
in a certain way, and performing loops in a certain order to optimise 
cache usage, I can't help thinking that although that might be quite 
easy in C or C++, it's going to be 100% impossible in any moderately 
high-level language. (Can you spell "garbage collection"?)
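
In C, at least, the loop-ordering trick amounts to matching the traversal 
order to the row-major layout of the array. A rough sketch (the actual 
speed difference is machine dependent, but both functions compute the same 
sum):

```c
#include <stddef.h>

#define N 1024

/* C stores 2-D arrays row-major, so m[i][0..N-1] are adjacent in RAM.
   Iterating i-then-j walks memory sequentially: every 64-byte cache
   line is fully consumed before the next one is fetched. */
double sum_row_major(double m[N][N])
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];        /* stride = 8 bytes: cache friendly */
    return s;
}

/* Swapping the loops makes each access jump N * 8 bytes, touching a
   new cache line nearly every time -- same result, many more misses. */
double sum_col_major(double m[N][N])
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];        /* stride = 8 KiB: cache hostile */
    return s;
}
```

And that's exactly the kind of control you give up in a garbage-collected 
language, where the runtime decides where your data lives.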

Obviously, the idea of high-level languages is that you work at a higher 
level of abstraction. And that means, for example, that you might 
implement some O(log n) algorithm instead of an O(n^3) algorithm, 
[thereby making your code *millions* of times faster], rather than 
fussing about cache hits and maybe gaining a few *hundred* percent speed 
increase. But even so...

Also, this explains some of the weird benchmark results I've been 
seeing. I wrote a merge-sort and made it run in parallel, and was rather 
perplexed to see that this is *slower* than a purely sequential 
algorithm. Now I begin to see why: sorting integers involves grabbing a 
pair of data values, performing a simple numerical comparison, and then 
moving them to somewhere else. I would imagine it's bounded by RAM 
bandwidth rather than CPU power. Maybe trying to run two sorts in 
parallel just saturates the FSB bandwidth twice as fast, so each CPU 
spends longer waiting around?

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

